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FINITE ELEMENT COMPUTATION OP A VISCOUS COMPRESSIBLE 
FREE SHEAR PLOW GOVERNED BY THE TIME 
DEPENDENT NAVIER-STOKES EQUATIONS 

By 

Charlie H. Cooke ^ and Doris K, Blanchard^ 

SUMMARY 

A finite element algorithm for solution of fluid flow 
problems characterized by the two-dimensional compressible 
Navier-Stokes equations has been developed. The program is 
intended for viscous compressible high speed flows? hence, 
primitive variables are utilized. The physical solution is 
approximated by trial functions which at a fixed time are piece- 
wise cubic on triangular elements. The Galerkin technique is 
employed to determine the finite-element model equations. A 
leapfrog time integration is used for marching asymptotically 
from initial to steady state, with iterated integrals evaluated 
by numerical quadratures. The nonsyrametric linear systems of 
equations governing time transition from step-to-step are 
solved using a rather economical block iterative triangular 
decomposition scheme. 

Proof of concept has been accomplished by the numerical 
computation of a free shear flow. Numerical results of the 
finite-element method are in excellent agreement with those 
Obtained from a finite difference solution of the same test 
problem. 


^Associate Professor of Mathematical and Computing Sciences, Old 
Dominion University, Norfolk, VA 23508. 

High-Speed Aerodynamics Division, NASA Langley Research 
Center, Hampton, Virginia 23665. 
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INTRODUCTION 


Over the past two decades the finite element method has 
become a widely accepted tool for obtaining reliable numerical 
solutions to problems in structural mechanics. However, in 
fluids calculations the method can at best be regarded as rela~ 
tively untested, although its application is steadily diversi- 
fying. Recent endeavors include analysis of flow problems 
involving free surfaces, such as occur in groundwater seepage 
and oil depletion problems in the petroleum industry, which 
involve the nonlinear gas flow equations (ref, 1) ; wind driven 
lake circulation problems including islands (ref. 2) ; limited 
region modeling of mesoscale phenomena associated with the 
dynamics of ocean circulation (ref. 3); calculation of flows 
over nonlifting circular airfoils, based on the small disturbance ' 
nonlinear transonic flow equations of an inviscid compressible 
fluid (ref, 4) ; laminar three-dimensional boundary layer flow of 
a multicomponent compressible fluid (ref, 5) ? and pressure dis- 
tributions for confined flow problems (ref. 6), 

For the most part these investigations have consxdered low 
Reynolds number low speed flows. Sparseness in problem dimension 
and number of dependent variables (in effect, no more than two 
space dimensions and/or two dependent variables) has been a simpli- 
fying characteristic. The governing equations permitted symmetric 
equation solvers when the problems were implicit* The present 
investigation, to the best of the authors ' knowledge, is one of 
the first attempts to solve by finite elements a fluid dynamics 
problem characterized simultaneously by a variety of cumbersome 
aspects tending to severely complicate the numerical formulation. 
The flow is governed by the time-dependent nonlinear non-self 
adjoint Compressible Naiver-Stokes equations, in two space dimen- 
sions and three dependent variables. (The mitigating assumption 
of constant total temperature, i.e,, adiabatic jet mixing, per- 
mits the energy equation to be replaced by an algebraic equation.) 
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A sophisticated higher order element, cubic B-splines on triangles, 
is employed. The nature of the problem forces area integrals 
arising from the Galerkin formulation to be evaluated by numerical 
quadratures. The resulting systems of finite element equations 
are nonsymmetric , and the problem size dictated development of a 
linear system solver specially adapted to the characteristics 
of the problem. 

The goal of the present investigation is two- fold; first, the 
development of a reliable numerical tool for computing laminar 
flows which can be extended for use in testing fully two-dimen~ 
sional tiarbulence. models for a wide range of free shear flow 
applications, such as interference heating, separated flows, jet 
exhaust noise reduction, combustor design, and tangential slot 
injection. Second:, an assessment of the feasibility of using 
the finite element method as a tool for fluid mechanic calcula- 
tions is an expected product of the research. The same problem 
has been solved by various finite difference schemes, and compu- 
tational results are readily available (ref. 7) for comparison. 

In terms of computet core size requirements, as well as 
efficiency per computational step, the explicit finite difference 
methods afford the most economical approach to solving the time- 
dependent viscous compressible Navier-Stokes equations. However, 
for numerical stability very small time steps relative to spatial 
grid size are required. This mandates long computation time to 
reach steady-state, especially for flows characterised by fine 
ittesh spacing in regions of large gradients. As a result, the 
unconditionally stable implicit alternating-direction methods 
(ADI) , Hopscotch ADE methods, and MacGormack's method (ref . 8) 
have been fovind currently the most popular alternatives. Although 
the practical advantage of ADI methods over explicit methods is 
nowhere near that predicted by the Von Neuman analysis, experience 
indicates large time steps are allowed, although somewhat less 
than an order of magnitude larger than the explicit GEL limit. 


3 



The features of the finite element method which appear to 
enhance its attractiveness for fluid dynamics applications, 
and which are distinct advantages over other numerical schemes, 
include : 

1, Complex boundaries and boundary conditions can be easily 
and effectively treated. For example, by using higher order 
elements which carry function and derivative values as unknowns , 
both function and derivative boundary conditions can be treated 
homogeneously, without extra boundary error generated by writing 
finite difference derivative approximations. Also, triangular 
elements allow more precise treatment of complex geometry, and 
even further precision is obtained through isoparametric elements 

2 . Mesh refinement in regions of large gradients is easily 
implemented. 

3 , The approximations for a particular problem are more 
flexible. These are reflected in freedom of choice over element 
shape, size, and order of approximation in each element. 

4. Time steps at least as large as allowed by ADI are 
permitted. 

In general, it can be said that the approach affords a 
fairly automatic scheme which does not leave much flexibility 
for manipulation of the mechanics of the algorithm, i.e,, the 
kind of differencing for individual terms is not readily extern- 
ally visible. 

On the other hand, it is clear that the advantages of the 
method, flexibility and reliability, come with a price attached. 
For example, the complexity of implementation of the method 
entails much more sophistication of computer code, longer 
development time, more manhour ard machine hour expense, etc. 
Aside from development of the fundamental numerical algorithm, 
myriad data manipulation . and supporting software programs need 



be produced for effective use of the method. Moreover, the 
execution time per computational step is relatively expensive. 

FIiUID DYNAMICS MODEL OP A 
FREE SHEAR FLOW 

A fluid dynamics problem whose description requires the 
complete Navier-Stokes equations is free mixing flow with no 
dominant flow direction. Such a problem occurs in the mixing 
of a supersonic jet with an imposed crossflow. This prob- 
lem would embody some of the complications often unavoidable in 
practical calculation of flows for real vehicles , such as sharp 
corners, truncated computational domain, computational boundary 
conditions at artificial downstream boundaries, and boundary 
conditions for mixed parabolic-hyperbolic flow. Such a problem 
has been attempted by finite difference methods (ref. 7), and 
it is intended to apply the finite element program for future 
comparisons of the solutions to this problem. 

However, since the individual effects of such complications 
are hard to isolate, the pioneer application of the finite element 
program developed under NASA contract NASl-11707-37 is to the com- 
putation of a free shear flow generated by the parallel mixing 
of two supersonic jets, initially separated by a thin splitter 
plate. Solution of such a problem does not in this case require 
the full Navier-Stokes equations, since fairly accurate results 
can be obtained using the quasi-parallel assumptions of parabolic 
boundary layer theory (ref, 9) . However, the availability of splu 
tions generated by several computational methods affords a ready 
basis for evaluation of the finite element method. ■ 


Plow Field Configuration 

The flow field configuration of the problem considered is 
shown in figure 1. The computational domain begins downstream 


from the base of the splitter plates. For the test case presented 
in the present paper the jet Mach numbers are 3 and 1.68 for 
Re = 1000. 

Governing Equations 

Steady-state flow is asymptotically approached through solu- 
tion of the time-^dependent Wavier- Stokes equations. The assump- 
tion of constant total temperature (adiabatic mixing) and two- 
dimensional flow yields the following non-dimensional systems 
of governing equations: 

Continuity : 
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Temperature relation ! 
T 1 - - v^ 


( 4 ) 





In equation (5) , U/T dimensionless, although T and the 

constant 198.6 (Sutherland's constant) are expressed in degrees 
Rankine. The variables used to non-dimensionalize eqs, (1) to (6) 
are presented in reference 7 . 

Throughout this paper, the notation and f. denote 

quantities associated with the x,y directions ; the convention 
for first partial derivatives v;ill be 


9f » 
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Boundary Conditions 

Boundary conditions for the problem are shown schematically 
in figure 2, Function specifications are given for all three 
variables on the inflow? symmetry conditions apply at the bottom, 
and on the top function specification is made for velocity, zero 
normal derivative for density. On the outflow a computational 
boundary condition is applied; this could be either linear or 
quadratic extrapolation. The downstream boundary condition 
computations will be discussed in detail in a ‘subsequent section 
of the present paper. 

FINITE ELEMENT APPROXIMATION 

Our approximation to the fluid dyhamics problem lequations 
(1) to (7) and boundary conditiGns of figure 2] is obtained by 
applying the classical Galerkin (or method of weighted residtais) 
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in conjunction with finite elements. The first step is to tri- 
angulate the computational domain (2 with boundary Vf and 
then consider piecewise polynomial approximating (trial) functions 
on this grid. 


Trial Functions 

For purposes of illustration, the trial functions for approxi- 
mating density variations are of the form 

N 

p(x,y,t) =2 Pj(t) (f)j(x,y) . (8) 


Here N is the total number of nodes. The (B-spline) functions 
{i|)j} comprise a local and interpolating basis for functions of 
the form equation (8) , which are continuous and piecewise cubic 
polynomial on with sectional ly continuous first partial 

derivatives, which are infinitely differentiable cubic on 
the interior of each triangle. (For a more precise description 
of the B-splines, see reference 10). 

The weights Pj("h) are chosen by Galerkin’s methods. Thus, 
the final approximating function satisfies all boundary and initial 
conditions at problem nodes, and approximately satisfies the 
governing equations over the domain. For each trial function 
(density and two velocity components) there are ten nodes per 
triangle; triple nodes at triangle vertices, and a single node 
at the centroid (see fig. 3). The parameters pj each associate 
with a distinct node, and represent approximations to function 
and first partial derivative values (p, P P „) at vertices, 
and function values alone at the centroid. 

Time Discretisation 

For simplicity, the time discretization will be indicated 
only for the pseudo-viscous continuity equation 



(9) 
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p U = e p , 


where D is the velocity vector. (For e non-zero but small, 
artificial viscosity is added to the continuity equation.) 

The weak form of (9) is obtained upon multiplying by an 
arbitrary function and integrating over fl. Applying Green's 
theorem to remove the second derivative from the equation one 
obtains 
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where is the normal derivative on F (positive in the out- 

dN 

ward direction) , 

The time discretization chosen employs central differences 
on time derivatives and time averages of space gradients in p« 
Denoting the time step by t and the time index by n(t^ = t^ + nt) , 
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If 4'j were the B-sp line associated with a node at which 
Pj is unknown (no boundary condition applies) , substitution of 
eguation (8) into (11) would yield the finite element equation 
corresponding to the density variable at this node. The velocity 
equations are obtained by a similar procedure. The chosen time 
discretization has the advantage of yielding (implicit) linear 
systems of finite element equations, second order in time and 
fourth order in space, except at the downstream boundary, where _ 
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the second order accurate quadratic (or first order accurate linear) 
extrapolation boundary condition applies. 


Local Equations 

On a given triangle SI the trial function for density may 


be written as 


^ ... 

Pg(K,y,t) = = 5^ 4 , 


where the p„ are density parameters associated with nodes on 
this element, and are the corresponding B-splines, with 

—T , ^ 

P = (Pl r V2t • r Pi o) 

( 13 ) 
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the time discretized finite element equation contributions from 
the triangle are as indicated below. A more detailed deri- 

vation of these equations is presented in reference 10 . 




continuity equation ~ Contribution from a single element 
(No artificial viscosity) 
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element may be obtained fxom the x-momentniri equations by inter*r 
Ghanging x and y, then reversing the algebraic signs of the 
boundary integral teanns. These equations are asymmetric in x 
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and y, when v?ritten with the boundary terms changed to area 
integrations using Green's theorem for the plane. 

Boundary integrals vanish when is not associated with 

a boundary node. Pflien is so associated, all boundary inte- 

grals over portions of element boundaries which are not coinci- 
dent with region boundaries cancel in pairs. 


Numerical Quadratures 

The iterated integrals of equations (IS) to (16) are in all 
cases inconvenient, and in the case of nonlinearities impossible, 
to integrate without nxamerical quadratures. The question of what 
quadrature scheme to use, and what order of accuracy it must 
possess, now arises. 

If the order of accuracy is insufficient, the overall accur- 
acy of the method is lower than otherwise obtainable with cubic 
elements. Hence one very redeeming feature of the ciabic element 
is vitiated; the same accuracy could be achieved more economically 
from use of lower order elements whose accuracy matches that of 
the quadrature scheme. 


The question of proper order of the quadrature-scheme for 
non-degradation of tlxe built-in accuracy of the finite element 
algorithm has been investigated by Fix (ref, 11) . For the present 
case, the quadrature scheme should be exact for two-dimensional 
polynomials of total degree six. 


One quadrature scheme which meets this requirement is a 
16-point scheme of the form 

■ 4 

f (5,n)d5dn = jwjj^ + f(5j, - 


+ ( 1 ) 


( 17 ) 


where the quadrature points and weights are found in table 

IV, page 134 of reference 12, However, the accuracy requirement 



is overfulfilled, and the effieiency of the algorithm suffers 
in this aspect. 

Here the integration is over a standard triangle, with 
vertices of (0,0), (1,*-1) , and (1,1). Integrals in (x,y) 
space are obtained from the above upon multiplication by the 
magnitude of the Jacobian determinant of the appropriate affine 
transformation which maps an element onto the standard triangle. 

When derivatives appear in (x,y) space integrands, the corres- 
ponding (?,Ti) space terms is obtained by the chain rule. 

For line integrals, transformation to the standard triangle 
yields an integral , 

I f (l,n)dn (18) 


which can be evaluated sixth order accurate with a third order 
Gaussian quadrature scheme. 

Global Equations 

The linear systems describing time transition of the numerical 
solution, which result from global assembly of equations (15) to 
(16) , are 


Continuity 
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Momentum 
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The vectors p, u, v contain all unkricwr?. nodal density 
and velocity variables? indices indicate time dependency of 
the corresponding computations. 

The matrices D, zz, zR, Rz, RR «ire nonsymmetric and variable 
banded, with symmetric profile. Velocity associated matrices each 
have identical profiles prior to allowance for any matrix fill on 
LU decomposition. After such allowance zz, RR aiid zR, Rz 
are respectively pairwise identical in profile. Storage and 
handling of these matrices is discussed in Appendix B, under 
mesh generation and mesh associated data arrays. 

Matrices employed in the solution process are triangularized 
by an LU decomposition routine, especially coded to account for 
lack of symmetry and matrix profile. The continuity equations 
are solved by the usual LU techniques. However, a special 
equation solver, the block iterative LU solver, was developed 
for momenttom solution {see Appendix A). This hybrid method 
employs the previously described LU routine, along with an 
iterative procedure, to accomplish the momentum solution with 
decomposition only of zz, RR, 

Computational Boundary Conditions 

For flow problems in which the extent of the computational 
domain is dictated by storage restrictions of the computer, 
closing the problem often involves the enforcement of an arti- 
ficial boundary condition on a truncated problem domain. The 
forced vanishing of a higher order derivative, although seemingly 
unnatural/ is a condition often applied in computational fluid 
dynamics (ref. 8). 

For example, the motiyation behind linear extrapolation as • 
a downstream continuation is the idea that sufficiently far 
downstream the flow variables should vary linearly with outflow 
direction* This would imply a vanishing Second derivative in 
the vicinity of the outflow boundary. Should one force instead 
the vanishing of third derivatives, the result is quadratic 
extrapolation. 
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Consider a finite element triangle oriented with one edge 
aligned with the outflow boundary and one edge perpendicular 
to it, with vertices at points ^ 


^^n+l' ^m+1 


) with x_., = X + h. 
n+1 n 


A cubic trial function 


on this triangle has the form 


f (x,y) = A + By + Cy2 + Dy^ + (E + Py + Gy2)x 

4 

+ (H + ly)x^ + Jx3 . 

The condition (trapezoidal rule) 


*n+l = *a + 2 [(11)^ + 


forces J to vanish? it represents the finite element counter- 
part of quadratic extrapolation, as .applied in finite difference 
settings . 

The counterpaart of linear extrapolation is not so easily 
achieved. For example, the extra condition 
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yields the (Euler integration) formula 


*n±l = ^ Kf 
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Conditions (22) and (23) force J, together with H + ly , to 

m 

vanish* This produces linearity over the triangle base , but not 
at other points. 


True linear ..extrapolation would require .1 = 0 as well; 
a condition not naturally enforceable. Linearity of the trial 
functions near the outflow could only be achieved by employing 
a patch basis (the replacement of cubic functions by linear 


( 21 ) 


( 22 ) 


(23) 


(24) 



basis functions iu this region) , which would somewhat complicate 
the program. In the absence of this extreme, the coefficient 
I(y ~ y^) of (21) can be kept small, and linearity approxi- 
mately achieved, by using a fine mesh grading in the y-direction. 

It is concluded that quadratic extrapolation is the more 
natural of the two. Further support for this conclusion is pro- 
vided by a Taylor's series analysis; even were true linearity 
achievable, the quadratic extrapolation affords an extra order 
of accuracy. This is particularly appealing, inasmuch as 
inaccuracy of the extrapolation can build a numerical boxmdary 
layer of error. 

Furthermore, from a physical point of view, near-boundary 
distortion to some degree is to be expected from linear extrap- 
olation, due to enforced outflow derivative eguality. While 
linearity may in some cases be a physically well-grounded assump- 
tion, one would not expect it to hold in a shear layer, even in 
steady state. Thus, non-normal diffusive forces are to be expected, 
since derivatives are unnaturally modified. 

Finally, the most compelling argument in favor of quadratic 
extrapolation is the following: As clearly seen from equations 

(22) to (24) , were linear physics present, the quadratic extrapo- 
lation does not preclude its being modelled; whereas, in the 
opposite circumstance, the linear extrapolation can indeed force 
unnatural distortion, the degree of which will depend upon near 
boundary mesh refinement. 

Comparison of Results for FEM and ADI Computations 
of a Free Shear Flow 

The finite element code initially developed hander contract 
iilASl-11707-37 was first applied to the gei mixing problem des- 
cribed in reference 7. However, program storucture prohibited mesh 
refinement extensive enough to resolve the flow, resulting from 
in-core storage of all system matrices. As a consequence, the 
code has been reconstructed to provide greater mesh refinement 
capability. Solution of the continuity equations is completed 
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prior to contmencing assembly of the momentiam equations ^ which 
frees space occupied by the D~matrix. During momentxHn assembly, 
further core is made available by building up the zR and Rz 
matrices on a disk storage device. The inconsummate amount of 
manual labor associated with providing the finite element tri^,. 
angles and mesh associated data arrays is now replaced by the 
automatic mesh generation package discussed in Appendix B, 

Due to suspicions concerning the physically well-posed 
nature of the "wall- jet" problem, by decision of the technical 
monitor the first application of the restructured code has been 
fluid dynamic calculations associated with a parallel two stream 
mixing (see 'figs, 1 and 2) . In this section numerical results 
are presented, as well as comparisons with the ADI solution for 
the same physical problem (ref. 7) . The major difference in 
problem formulation between the two methods concerns computa- 
tional boundary conditions; quadratic extrapolation for the 
finite element method (FEM) , as opposed to linear extrapola- 
tion for ADI. 


Differences in Domain 

The ADI method was applied on a 10 x 122 grid point uniform 
rectangular mesh, with iinit increment h = .025. For FEW pur- 
poses, economy was achieved by truncating at the top, leaving 
a 10 X 100 grid point mesh. However, the final FEM mesh was 
much coarser . 

The 10 X 100 mesh was triangulated as follows: First, a 

coarse rectangular grid was superimposed; then each rectangle 
was dissected to form two triangles. In the x-direction the 
grid was rather coarse; increments were respectively of lengths 
3h, 4h, 2h. In the y-direction, mesh increments were 3 (3h) 

spacings; 22(2h) spacings; l(3h) followed by 9 (4h) spacings|. 
ending with l(3h) and 2 (2h) spacings. The result is a non- 
uniform mesh, more refined in the shear layer; hear the outflow 
(to afford more accuracy for the quadratic extrapolation) ; and 
at the top. 



This triangulation produced a 4 3S 39 rectangular mesh yielding 
228 triangles/ characterized by a maximum of 672 possible unknowns 
per dependent variable (see fig. 3 for placement of FEM unknowns 
on a triangle) . Actually, application of boundary conditions 
reduces the maximum munber of unknowns per .dependent variable 
to 612. In contrast/ for this same domain and the uniform mesh, 

ADI involves approximately 900 unknowns per dependent variable. 

Boundary Condition Differences 

Along the top, steady state converged ADI function values 
were employed as boundary values for FEM calculations. Except 
for the type of downstream continuation, other boundary treatment 
and function boundary values were identical. Derivative boundary 
values were obtained from a spline fitting routine. 

For initial flov? field, interior values were obtained by 
linear interpolation, with function values of the outflow 
differing by as much as 50 percent in function (and much 
larger in derivative) values from the expected steady-state 
results. 


Artificial Viscosity 

An artificial viscosity terra characterized by e = ,0001 
[see eg. (9)3 was addend to the continuity equation. Since a 
stability analysis (ref, 10) indicates marginal stability for 
the FEM formulation of the continuity equation, this theoretically 
advisable but practically negligible safety factor was deemed 
necessary. An order of magnitude larger e is required to pro- 
duce noticeable changes , and two orders of magnitude larger to 
produce maximum one percent changes, in ten steps; hence, for 
practical purposes it would appear the artificial viscosity can 
be ignored (none was used in the corresponding ADI Calculations) 

Numerical Results 

Steady state results, for computations with Re = 1000, 
e = .0001, and quadratic continuation, were essentially, achieved 



with the following sequence of time step sizes 14{.001)r 
7 (.002), 35 (.005), 49 (.01), for a total of 108 iterations. 

The program * was allowed to run further, and numerical steady- 
state results are presented after 186 steps. The maximum step 
tolerable has not been determined, although mote recent compU’- 
tations appear reasonable when a step size of .03 is employed. 
However, catastrophic divergence occvurs at step sizes of around 
10 CFL. [The CFL (explicit) stability limit for the SDI mesh 
is approximately .023.] Thus, it would appear steady state 
could be reached in fewer steps, by consistently using the 
maximum tolerable step. 

Velocity vector plots for the FEM flowfield, evaluated at 
triangle vertices, are presented in figure 4. To gain some 
idea of the comparative density of FEM and ADI grid points, the 
FEM flowfield is interpolated to the ADI grid using the cubic 
finite element model of the solution (see fig. 5) . 


Figure 6 shows steady state velocity and density variations 
at stations ki = .075 and X 2 = .175 in the flow. The pressure 
is constant at .00389 over the field, with maximum deviations 
of 5 X 10“5. 

Table 1 shows actual nximerical differences between the final 
steady state FEM and ADI computations. Steady state was defined 
by the convergence criterion 


where Af^^ ^ ^n+1 ~ ^n ^ time difference and f = p, u, or 
v. Table 2 and figures 7 to 9 present percent differences between 
the two calculations , 


% difference - 


FEM - ADI 
ADI 


X 100 



with ADI results as base. As expected , extreme differences 
occur in the mixing region, where boundary condition differences 
on the outflow have most effect. 

Due to the fact that the normal velocity component was zero 
or near zero over relatively large regions of the field, lack of 
enough significant figures of accuracy contributes to misleading 
percent differences in data for this component unless this is 
taken into accomt. Therefore, in computing this data the percent 
difference was automatically set to zero if either 

1 vj 5 X 10~? 

or the actual difference satisfied 
|FEM - ADl|^ < 10“5 . 

Thus, only four significant decimal places of accuracy are assumed 
in the FEM solution? this appears to be a realistic assumption, 
from the manner of approach to steady state. . 

The relative rates of convergence for the two methods are 
shown in figure 10, which exhibits transient behavior of the 
fluid dynamics variables of selected field points above, below, 
and within the shear layer. The time to convergence for the 
two methods appears to be almost identical, with the FBM algorithm 
exhibiting the less smooth approach to steady state. This we 
possibly may attribute to the non*-uniform time step and noii- 
uniform mesh employed by FEM, versus the constant time step and 
uniform mesh used for ADI [the ADI calculation consistently 
employed a time step of (CFL -) .023373. 

Cpst Comparisons 

The ADI code requires machine core storage of 107 Kg for the 
1220 node (10 x 122) mesh (this core requirement would change 
little had the truncated domain been used in the computation) . 

For the truncated problem, the FEM code requires 236 Kg words 

■ . . . 21 


of storage. Moreover, using the full ADI domain would require 
approximately 20 percent more core, since the FEM core size is 
problem dependent. ^Phe CDC-6600 CPU time for ADI is .00374 
sec/node and FEM requires .229 sec/node, for each time step. 
Additionally, the FEM code utilizes .8 sec/node of PPU time 
per step , for disk reading of time invariant data (which other- 
VT^ise would require additional CPU time per step) and disk buildup 
of system matrices zR, Rz . 

To provide some idea of where FEM CPU time is used, normal 
data printout and assembly of system matrices requires approxi- 
mately 86.5 percent and equation solving 13,5 percent of the time 
used per step. Since over 1800 linear equations characterized by 
large bandwidth nonsymmetric matrices are being solved per step, 
it appears any inefficiency of the code occurs in that portion 
devoted to equation asserttoly. 

Alternate Program Design 

In restructuring the program design of the finite element 
code produced under contract NASl-11707-37, two versions of 
code which implement the basic numerical algorithm and which 
provide greater mesh refinement capability have been produced. 
Version I incorporates some minor design alterations and program 
optimization performed by personnel of Computer Sciences Corpor- 
ation, and is structured so that only one of the matrices D, 
zz, zR, Rz, RR occupies core at one time. As equation 
assembly proceeds the density and velocity local stiffness 
matrices for each element are assembled in parallel. In turn 
each global matrix is read into core from disk, local equation 
contributions are globally distributed, and the matrix is written 
back. As a result of these huge data voltmie disk writes (say, 
20,000 words per write) , peripheral processing time is exorbitant, 
and the result is extremely poor throughput, caused by excessive 
roll-outs of the program due to its heavy demands on system 
resources. For example, in one run preceding a moderate number 
of time steps, day file entries stretched over a twelve-hour 


period, with around 15 roll-outs> although total dollar cost 
quoted in the day file was around $200. (PPD time carries no 
charge on this operating system other than through o/s calls,) 

Design philosophy of Version II , programmed by the authors, 
centered around the goal of minimum core requirements as well as 
minimum data volume at each disk write. The outcome is greatly 
improved throughput, due to decreased data volume per disk write 
(a maximum of 220 words/write except for a few isolated large 
volume writes per step) , at the expense of a greater number of 
o/s calls and slightly greater core requirements. In version II 
continuity assembly and equation solution is disposed of, without 
disk storage of any data involving the D-matrix, prior to com" . 
mencing assembly of the momentum equations. During momentum 
assembly, the matrices zz and RE. are built up in core, with 
local element stiffness matrices for zR and Rz written to disk 
triangle by triangle, each record of length 200 words. In-core 
global matrices are then written to disk, element stiffness 
matrices are read, and zR, Rz are assembled in the space 
vacated by zz, RR. The global matrices zR, Rz are then written 
to disk by rows, one record containing one row each from zR and 
Rz, zz and RR are then read from disk (an isolated large vol- 
ume disk operation) and during equation solution the rows of zR, 

Rz are read as needed, 

A comparison of the relative merits of the two versions is 
provided below with reference to the 228 triangle mesh. It 
should perhaps be remarked that the data below is approximate, 
based on a few runs of duration four time steps each. 

o/s Calls, Version II requires twice as many as version I, 
at approximately .135/node per step, on a normal run. Depending 
upon the number of programs in .the system simultaneously competing 
for .the same disk device, PPU time for version II can vary by as 
much as 200 percent, for the same number of time steps. 

Gore storage - 165 Kg, version I, as opposed to 236 Kg, 
version II . 
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3;hroughput, Excessive roll-outs for version II, around 10 
to 1 compared to version II . This can mean an extra day of 
elapsed time in getting the program back to the programmer, 
at some times two or three days turn around time (without 
priority). 

Dollar Cost . Total cost indicated in the day file shows 
version II twice as dollar-wise expensive, due basically to 
increased core and twice the number of o/s calls. 

CPU Time . Approximately the same, 

PPU Time . Approximately three times as much for version I 
as for version II. 

As a result of program throughput and excessive demands on 
system resources of version I, version II was chosen, after 
conference with personnel of the Analysis and Computation Division 
at Langley Research Center, as the vehicle for steady state numer- 
ical solution of the free shear flow problem. 

CONCLUSIONS 

Under contract NASl-11707-37 and NGR 1098 a finite element 
algorithm for solution of fluid flow problems characterized by 
the two-dimensional Navier-Stokes equations has been developed. 
Proof of concept was provided by the calculation of primitive 
flow variables for the free shear flow problem, which provided 
excellent numerical results in comparison to the ADI method. 
Unfortunately, the algorithm places heavy demands on computer 
system resources, in terms of CPU time, core storage, PPU time > 
and o/s calls. Thus, total dollar cost of executing the algorithm 
appears extravagant. Moreover, due to algorithm complexity as 
well as the host of data manipulation and supporting programs 
essential in minimizing human labor, development time is much 
inexcess Of that normally expected in finite difference 
applications . 

The critical question in evaluating the method is how much 
performance improvement might be expected by alternate program 


designs alternate analytical formulations, such as the use of 
a simpler element. Versions I and II of the program give an 
indication of how simple trade-offs in system resources supply 
drastically different performance in terms of dollar cost 
opposed to less eKcessive demands on system peripheral devices 
and improved program throughput. 

Considering the present program design, it appears that 
the method used for solving the large systems of equations that 
arise is by far more economical than other Icnown direct or in- 
direct linear system solvers for equation classes as general as 
those presently involved. Moreover, the total I/O design of the 
FEM code as well as economy in core storage through use of disk 
devices is enabled by the solver features which allow system 
matrices to be processed by parts during equation assembly and 

solution* 

* 

However, one might reasonably expect that alternate program 
designs or other analytical formulations could yield further 
economy in terms of equation assembly time. For example, one 
factor that stands out as a contributor to inefficiency is the 
16-point quadrature scheme used for evaluation of integrals over 
a triangle. Theory predicts the quadrature should be exact for 
two-dimensional polynomials of total degree six in order not to 
vitiate the accuracy of the method achievable from ciobic elements 
(ref. 11). Further theory predicts the existence of eight or nine 
point quadrature schemes possessing this degree of accuracy . 

(ref, 13). Unfortunately, mathematicians have yet to synthesize 
such a scheme? research is lagging practical needs in this area. 
Thus, one may perceive significant improvement in algorithm per- 
formance could such schemes be found, since there is abundant 
use of quadratures throughout the code. 

By way of alternate analytical formulations, there exist 
cubic elements on rectangles (ref. 14) which could equally well 
replace the present cubic elements on triangles. Moreover, 

9-point quadrature schemes for rectangles of the necessary degree 
of accuracy do exist, in practice as well as in theory. One might 


25 



fault the general applicability of rectangular elements for regions 
with curved boundaries? however, this seems an irrelevant point 
since fairly good isoparametric rectangular elements are known. 

On the other hand, one might conjecture that for a problem 
of the present complexity, the simpler should be the approach 
to its solution. Hence, linear elements come into consideration. 
The matrices arising could be expected to be of larger dimension, 
but bandwidths should be significantly decreased; one could pre- 
diet that equation storage would be somewhat comparable. More- 
over, at lower degrees of accuracy such as is needed with linear 

* 

elements, more efficient quadrature schemes are available. 

In summation, the finite element method developed appears to 
have excellent prospects in fluid mechanic applications, with 
respect to convergence to the steady solution and accuracy of 
the final results. In terms of complexity of implementation 
and computer resource demands and costs, as presently formulated 
the finite element method does not compete with standard finite 
difference techniques, A critical evaluation of the applicability 
of the method iu fluid mechanics would require one to decide how 
much more favorable would be alternate program or analytic design 
in the method of implementation, as well as a determination of 
what particular types of problems presently difficult for the 
finite difference approach might readily yield to finite elements, 
such as high Reynolds number flows and complex geometry problems. 



APPENDIX A 


A BLOCK ITERATIVE LD SOLVER WEAKLY 
COUPLED FOR LINEAR SYSTEMS 

INTRODUCTION 

In some fluid dynamic applications of the finite element 
procedure the governing systems of linear equations arising from 
the numerical analysis are weakly coupled between distinct sets 
of flow variables. Solving such systems by the usual fixed band 
algorithms results in excessive program execution times, in par- 
ticular for large ncnsymmetric matrices and the time implicit 
asymptotic approach to steady state solution. However, alternative 
equation solvers may be developed which exploit the weakness of 
the coupling to produce significantly decreased equation solving 
time. 

Assuming such equation solving techniques are applicable, 
the natural implicitness of the finite element method becomes 
less of a setback? the method then appears more competitive with 
the well established finite-difference techniques . Moreover, 
such equation solvers could be applied in finite difference 
settings as well, in which case higher-order-accurate implicit 
numerical schemes which for efficiency do not rely on the tri- 
diagonal systems arising from ADI (ref. 8) and spline interpola- 
tion methods (ref. 18) could evolve. 

In this appendix a technique which shall be called the block 
iterative LU solver is outlined. The efficiency of the method is 
dependent upon the supposition of a weakly coupled linear system; 
hence, its lack of generality in application does not grant a 
cure-all for the large non— sparse nonsymmetrie system problem. 
However, where weak coupling is present the execution speed of 
the block solver makes it an opportune method for solving large 
linear systems. Furthermore, such systems appear frequently 
enough in scientific applications to warrant some general attention 
being devoted to this hybrid technique. 
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WEAKLY COUPLED LINEAR SYSTEMS 
Consider the linear system 

AZ = h / (A-1) 

where A is an n-square matrix, with Z and b n^-dimensional 
vectors . Partitioning the vector ^ to obtain a vector X which 
has p components and a vector Y with g components, p + q = n, 
induces the partitioned system? 


I 

1 



I 

I 





(A-2) 


Definition 1. The system (A~2) is said to be weakly coupled 
between the X, Y sets of variables if the quantities 

Cjj = //C// 

Cy = y/E-v/ y/D// 

are small compared to unity* 

Here we use the usual Euclidean norm //x//g for vectors,, 
and the compatible matrix norm 


//M// = max //MX//g 
//X//^ =1 

Since 


{A~4) 




^x - //B// 


and 



> 


//D// 
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ths essential content of this definition is the intuitive idea 
that the system (A-2) is weakly coupled provided the elements 
of the matrices C, D are small compared to those of B, E. 

DESIGN OP THE BLOCK ITERATIVE SOLVER 

Consider the following equations defining an iterative 
solution of equation (A-1) : 

Bx^+1 = bi - CY^ 

k=l/2f3,...n (A— 6 ) 

= b2 - DX^ 

where B, E are assumed invertible, A sufficient condition for 

convergence of the iteration is that the coupling coefficients 

C , C of (A“3) each be less than unity; that is 7 the condition 
X y 

required for weak coupling. 

Compared to direct methods of solution, the efficiency of 
this iteration is dependent upon the sparseness of C and D; 
the method of inversion of B, E; the closeness of the initial 
vector to the true solution of the system; and the weakness of 
the coupling. Of these, the method of inverting B,E is 
usually the only controllable influence. The block iterative LU 
solver proposed here is thus to be comprised of the block Jacobi 
iteration (A^-e) using an LU decomposition for the inversion ‘ (once) , 
of B, E,. . with subsequent front and back solves at each itera^- 
tive step. (Of course, the matrices B, E must be sufficiently 
diagonal dominant to allow the LD decomposition.) 

A tandem advantage of such a solver , in addition to speed 
considerations , is that for large systems the problem splits 
naturally into pieces which may be stored in core or on external 
1/0 devices. For example, the matrices B, E may be retained 
in core, and G, D row stored on disk, to be read as needed 
When forming the right members of (A" 6 ) . For large systems this 
factor alone cOuld dictate the choice of the hybrid solver. 
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PROGRAMMIUG RESULTS FOR A SPECIFIC APPLICATION 


Weakly coupled linear systems of the form (A"2) , with 
p =; q = S, arise in certain' time implicit finite element models 
for solution of the two-dimensional viscous compressible Kavier- 
Stokes equations in primitive variables , such as the present, 
where the assumption of constant total, temperature removes the 
necessity of solving the energy equation. Here system (A-1) 
represents the finite element modelled momentum equations, and 
X, Y are the collections of u, v velocity variables associated 
with mesh points. The finite element algorithm is such that ele- 
ments of the Cr D matrices contain a factor of times tep divided 
by Reynolds number, x/Re, not included in the elements of B, B* 
Weak coupling results from small time steps and high Reynolds 
number flows, say x = 0(10“^) and Re = 0(10®), in which case 
experimental observation shows the elements of B, E are 0(10“^) 
compared to C, D elements, which are 0(10"®). Hence on a 14- 
digit machine such as the CDC-5600 the elements of C, D are 
definitely significant in the problem solution, but the cross 
coupling between the two sets of velocity variables is rather 
weak. 

All factors contributing to efficiency in execution of the 
block solver are present in the problem discussed. Finite element 
matrices are naturally sparse, and in the present case nonsymraetric 
and variable in bandwidth. The coupling is very weak, and the 
initial vector is the flow field one small timestep removed from 
the final solution vector, assuring rapid convergence. The B, E 
matrices are small perturbations from symmetric, positive definite 
matrices, so the LU decomposition applies. 

The test case presented in the present paper is characterised* 
by a system of n = 1086 equations, with the square matrices 
B, C, D, E (zz, zR, Rz, RR) nonsymmetric and variable band, 
but with all having identical intraband distributions of zero 
and non- zero elements. All matrices were profile stored (ref .17) , 
with space allowed for matrix fill in the decomposition of B, E. 
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The LU decomposition algorithm took no advantage of matrix si->arse" 
ness other than knowledge of profile ;(fill and variable bandwidth) . 
The semi-bandwidth* of each of B, C, D, E, if considered as 
fixed band matrices, was 24 . Thus# the semi-bandwidth for the 
composite system (A-l) would be 567. The matrices B, E were 
stored in-core, with C, D row stored on disk, one disk record 
being comprised of one row from C and one row from D. To solve 
such a system on the CDC-6600 computer required approximately 
12 seconds CPU time and 8 iterations to convergence initially, 
and approximately 11 seconds and 7 iterations after the time 
asymptotic solution had progressed several steps (showing the 
influence of the initial vector) . Iterative improvement (ref. 5) 
experimentation indicates 12 significant figures of accuracy (oh 
the 14"digit CDC-6600 series) for the solution. 


* A maiyrix of bandwidth 2m t 1 has semi-band? 7 idth m (integer) . 
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APPENDIX B 

A MESH GENERATION PROGRAM (MESHGN) 

INTRODUCTION 

Manual generation of a finite element mesh and mesh associated, 
data is most time consumingf for large grids the task becomes nearly 
impossible to accomplish accurately. For example, although modern 
computers can solve a plane stress problem (steady state) with a 
thousand elements in, say, under ten minutes, the division of 
the field into elements and the associated data preparation and 
checking may take several days. Moreover, one has no guarantee 
that the first trial run with the finite element numerics model 
will not uncover some irregularity in solution behavior requiring 
mesh refinement in portions of the mesh, if not over the whole 
mesh, requiring the job be repeated 1 Thus, an essential compon- 
ent for success of any finite element endeavor is the availability 
of an automatic mesh generation program. 

To overcome this obstacle, an automatic mesh generation 
routine has been programmed. Since the problem domain is rec- 
tangular, the structuring of such a program of this code appeared 
more feasible than obtaining and becoming familiar with the 
intricacies of other mesh generators on the market. 

The code developed triangulates a rectangular region, numbers 
the nodes associated with triangle vertices and centroid, and 
generates mesh-associated data arrays. Use of the code reduces 
mesh generation manual labor requirements to about two hours , for 
a large mesh having above 600 unknowns per dependent variable. 

Mesh Geometry 

The routine superimposes a rectangular grid work on the 
problem domain, as in finite differences, then divides each rec- 
tangle into triangles (see fig. 3j . In the numerical evaluation 
of boundary integral qua.dratures within the finite element algo r- 
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ithnir it is assumed that no triangle has more than one edge on 
the boundary of the region. Therefore , top right and bottom 
left rectangles are subdivided into triangles whose hypotenuse 
is the lower left to top right diagonal of the rectangle. The 
reverse situation applies for all other rectangles. The finite 
difference rectangles are formed by mesh intersections of the 
lines x = x. ,i=l, 2 , ... UX6 and y = y . r j = 1 f 2 , . NYS. 

1 J 

Some degree of irregularity in the mesh may be achieved by non- 
uniform spacing of the X. r y. subdivisions. For proper 
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functioning of the routine the minimum number of triangles that 
can be configured is eighteen; i.e. , NXG and NYG must each 
exceed three. 

MESH NUMBERING 
Triangle Numbering 

Triangles are numbered left to right and top to bottom, 

IT =1, 2, ... , M. Here IT designates triangle number, and 
M the total nuit±ier of triangles. 

Node Numbering 

The spatial variation of each dependent physical variable 
in the flow field is approximated by a finite element trial 
function. Each trial function Contains parameters approximating 
flow variable function and first partial derivative values at 
triangle vertices are referred to as multiple nodes , having 
three problem nodes associated with each. Triangle centroids 
are simple nodes . Nodes are numbered left to right and top to 
bottom, a level of multiple nodes followed by a level of simple 
nodes (see fig. 3). 

The two-dimensional array NODE (M, iD3) , M = the total number 
of nodes, indicates which geometrical nodes are associated with 
a particular triangle. Node numbers for triangle IT are stored 
in NODE (IT, J) , J = 1, 2, ..., 10. These ten node numbers are 
obtained by proceeding counterclockwise around each triangle. 
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procuring sequentially geometrical node ntimbers previously 
assigned ^ with, the centroidal node procured last* (For meaning- 
ful assembly of the finite element equations/ it is essential 
that a consistent choice of either counterclockwise on all 
triangles/ or clockwise on all/ be made.) Normally the starting 
vertex for a particular triangle is of no consequence. However/ 
in order that boundary integrals in the finite element numerics 
be properly computed/ the present program assumes the starting 
vertex for triangles with one edge on the boundary be interior 
to the problem domain. 

Variable Numbering 

Node numbers just discussed refer to a physical location in 
the problem domain. Now consider in turn each particular physical 
dependent flow variable/ such as density or a velocity component. 
There is associahet^ with each node a parameter of the trial 
function which approximates this flow variable . If a boundary 
condition applies at this node/ the parameter is a known variable; 
otherwise an unknown. Each category of variables is numbered 
separately/ and these numbers do not necessarily coincide with 
node numbers. 

Unknown variable numbering . Proceeding triangle by triangle/ 
and counterclockwise around the nodes of each/ the mesh generator 
assigns Sequentially unknown variable numbers of nodes where no 
boundary condition applies/ provided the node has not been num- 
bered already on a previous triangle. To minimize matrix fill 
in the resulting system of finite element equations/ the unknown 
variable numbers are then reverse ordered, as in the Reverse-Cuthill 
bandwidth minimization routine. It has been verified that this 
process gives near minimal profile for the system of equations. 

For example/ the bandwidth minimization program of Poole (tef. 15) 
did not yield as economical a matrix storage as the present process, 
although it produced slightly smaller maximum bandwidth. 

Known variable numbering . The only input to the mesh gener- 
ator as far as variable numbering is concerned is a list of the 
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nodes at which boundary conditions apply. These nodes are 
sequentially assigned known variable numbers. The input list 
of nodes need have no particular order. Of course, a separate 
node list for density and velocity boundary data is required. 

Variable number access . The array I0R(2, ID3) establishes 
the correspondence between known or unknown variable and the 
physical node with which the variable associates For example, 
the variable number associated with node I is obtained from 
the FORTRAN statement 

IV = I0R(IP, I) . 

For IP = 2, IV is a velocity variable mamber; for IP = 1, 
it is the number of a continuity variable. 

Variable nimnber discrimination . The discrimination of whether 
a variable number access yields the number of a known or an, unknown 
variable is determined from bit settings in the array FIiA6(M) . 

If the node associated with the variable is on triangle IT, the 
bit settings of the location FLAG (IT) determine whether the 
variable is known or unloiown. The bit settings in location IT 
of the array FLAG in general display information concerning 
triangle IT and variables associated with it. 

Slack variables . Slack variables , which are defined belov;, 
are employed in the velocity components to assure that velocity 
variables associated with a particular node are both known or 
both unknown. For example, if one component is known and the 
other unknown, the equation 

X = known value 

is assembled for the known variable, and it is otherwise treated 
as unknown throughout the equation assembly and equation solving 
process. This avoids separate variable lists between velocity 
components, and leads, to determining equations for momentum 
variables identical in profile for the two sets pf moraenttua 
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matrices. Bookkeeping in the iquation solving is greatly simpli- 
fied. Throughout mesh generation it is automatically assumed that 
velocity variables at a node are both known or both unknown; 
node numbers of slack variables are input to the finite element 
numerics program/ and after equation assembly the proper equations 
are modified to produce slack variables. 

Triangle Coordinates 

Triangle coordinates are created by MESHGEN and stored in 
an array C00RD (M, 2/4); x-.y coordinates of all Vertices and the 
centroid. This three-dimensional array is not used in the finite 
element numerics algorithm directly; coordinate associated quan- 
tities such as the Jacobian of the affine mapping which transforms 
a particular triangle onto the standard triangle with vertices 
(0/0)/ (1/ -1)/ and (1/1) (used in evaluating numerical quadra- 

tures) are computed and passed along. 

MESH ASSOCIfiTED DATA ARRAYS 

The arrays RODE/ COORD/ FLAG/ lOR discussed in the 
preceding section , automatically generated by MESHGEN / are the 
output which one would normally ascribe to a mesh generator. 

Before MESHGH was coded, the manual generation and checking of 
these arrays, for a 103-triangle mesh, required over a week, 
with no guarantee of accuracy. The availability of this data 
permits computation of other mesh associated data discussed in 
the sequel, such as band structure and storage information for 
system matrices . 

In-core Storage of System Matrices 

As previously indicated, the finite element system matrices 
are the density matrix, D, and the velocity matrices zz, 2R; 

Rz, RR. These matrices are variable band, of sighif leant maximum 
bandwidth, sparse within the band, and iionsymmetric. However, 
all matrices have symmetric profile; i,e., if matrix element 
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a. . is nbnzero, so is a... The use of slack variables forces 
xj 31 

all velocity matrices to be of the same dimension and have identical 
profile, leading to simpler bookkeeping. The profile matrix stor- 
age scheme used accounts for the variable bandwidth, but does not 
utilize intra-band matrix sparseness. 

Let R. be a vector whose elements a., are the elements 

of a matrix A which occur in row i, starting with the first 

ot 

nonzero element and ending with the last nonzero element. The 
matrix is then stored as a one-dimensional array by stacking 
end-to-end the vectors R^^. 

Matrix elements of D, zz, RR are accessed by storing 
locations of the diagonal elements in the array JT(2,N) N the 
maximum dimension of D, zz. The code 

IP = JT(I,K) + J - K 


obtains the location of element a^^j 
the corresponding element location in 


of D , for I 
zz,RR, if I 



and 


In performing the LU decompositions of D, zz, RR it is 
necessary to know the left and right semi-bandwidths, or the 
number of elements in a matrix row to the left and right of the 
diagonal*. This information is stored in the arrays IBS(2,W) 
and IBL(2,N) . Here 


IP = TBS (I,J) 


aGcesses density information if I = 1 and velocity information 
if I = 2. This convention holds for arrays IBS, IBL, JT. 

The mesh generator computes the maximum bandwidth for each 
matrix row, adjusts the bandwidth to allow for any matrix fill 
occurring in the decompositions of D, zz, RR, and computes 
diagonal element locations (JT) and semi-bandwidth (IBS, IBL) 
infoannation reflecting allowance for matrix fill. 


Disk Storage of System Matrices 

Two optimized versions of the original finite element ntnnerics 
program produced under contract NASl-11707-37 have been programmed. 
Version I, consisting of some minor design alterations and pro- 
gram optimization by personnel of Computer Sciences Corporation, 
is structured so that only one of the matrices D, zz, zR, Rz, 
RR occupies core at one time. As equation assembly proceeds, 
the density and velocity local stiffness matrices for each element 
are assembled; each partially assembled global matrix is read 
into core in turn; the local equation contributions are globally 
distributed; and the matrix is written back to disk. As lower 
semi-bandwidths for each row of zz, zR, Rz, RR are identical, 
the IBS array as described above serves all. However, since no 
matrix fill in Rz , zR need be accounted for in the equation 
solving, upper semi-bandwidths may differ between rows of zz, RR 
and zR, Rz. This necessitates the keeping of arrays JTV(W), 
IBLV(N) , similar in structure to JT and IBL except for fill 
allowance, which indicates diagonal element storage and upper 
semi-bandwidth information for zR and Rz. 

Version II, with design alterations by the authors, differs 
in structure from version I. Here continuity assembly and solu- 
tion completes, with no disk storage of D, before momentum 
equation assembly starts. During momentum assembly, zz and RR 
are built up in-core, with local element stiffness matrices for 
zR, Rz written to disk, triangle by triangle, each record of 
length 200 words. In-core matrices are then written to disk, 
element stiffness matrices are read, and zR, Rz are assembled 
in the space vacated by RR, zz. The matrices Zr, Rz are 
written to disk by rows, each record containing one row each 
of zR and Rz. The arrays IBS, IBC, JT, IBLV above 
described apply in version II as well. However, since zR and 
Rz must be assembled en toto prior to any disk write, the JTV 
array now used locates only the diagonal element of zR, with 
the corresponding row of Rz stacked end-to-end with it. The 
row records of zR, Rz are read as needed in the block iterative 
solution of the momentum equations. 
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Time Invariant Data 


In the finite element numerics program B-spline function 
and derivative values/ as well as area integrals of combinations 
of such/ are needed at each time step / but do not change with 
time instant. These B-spline quantities and time invariant 
integrals are computed/ for each triangle/ in the mesh generator/ 
to be read from disk as needed in the time asymptotic computation. 
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Figure 2.“ Boundary conditions on computational domain. 
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Figure 5. - Velocity vector plots for the AD! gridwork. t Interpolated using 


FEfA solution model. ) 
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(a) X = 0. 075 


Steady state results, FEM solution. 
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Figure 6.- Concluded. 
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(b) Streamwise velocity. 
Figure 10 .- Continued. 
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